fileformatsfandomcom-20200215-history
Delta encoding
Delta encoding is a way of storing or transmitting data in the form of differences between sequential data rather than complete files. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required. From a logical point of view the difference between two data values is the information required to obtain one value from the other. The difference between identical values (under some equivalence) is often called 0'' or the identity element. A good delta should be minimal, or ambiguous unless one element of a pair is present. Perhaps the simplest example is storing values of bytes as differences (deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, we would store 2, 2, 2, 3, -2. This is not very useful when used alone, but it can help further compression of data in which sequential values occur often. A delta can be defined in 2 ways, ''symmetric delta and directed delta. A symmetric delta can be expressed as: :Δ(v''1, ''v''2) = (''v''1 \ ''v''2) U (''v''2 \ ''v''1), where ''v''1 and ''v''2 represent two successive versions. A ''directed delta, also called a change, is a sequence of (elementary) change operations which, when applied to one version ''v''1, yields another version ''v''2 (note the correspondence to transaction logs in databases). A variation of delta encoding which encodes differences among substrings is called incremental encoding. It is particularly effective for sorted lists with small differences between strings, such as a list of words from a dictionary. In delta encoded transmission over a network where only a single copy of the file is available at each end of the communication channel special error control codes are used to detect which parts of the file has changed since its previous version. The nature of the data to be encoded influences the effectiveness of a particular compression algorithm. Delta encoding performs best when data has small or constant variation; for an unsorted data set, there may be little to no compression possible with this method. The following C code performs a simple form of delta encoding and decoding: void delta_encode(char *buffer, int length) { char t = 0; char original; int i; for (i=0; i < length; ++i) { original = bufferi; bufferi -= t; t = original; } } void delta_decode(char *buffer, int length) { char t = 0; int i; for (i=0; i < length; ++i) { bufferi += t; t = bufferi; } } Another instance of use of delta encoding is RFC 3229, "Delta encoding in HTTP", which proposes that HTTP servers should be able to send updated Web pages in the form of differences between versions (deltas), which should decrease Internet traffic, as most pages change slowly over time, rather than being completely rewritten repeatedly: This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1. Many HTTP (Hypertext Transport Protocol) requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry. Research has shown that such modifying updates are frequent, and that the modifications are typically much smaller than the actual entity. In such cases, HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes, rather than the entire new instance of the resource. See also *String-to-string correction problem External links * RFC 3229 - Delta Encoding in HTTP * RFC 3284 - The VCDIFF Generic Differencing and Compression Data Format Category:Data compression